Mapping the Natural Language Processing Domain: Experiments using the ACL Anthology

Authors

  • Elisa Omodei
  • Jean-Philippe Cointet
  • Thierry Poibeau
  • Maurice Arnoux
Abstract

This paper investigates the evolution of the computational linguistics domain through a quantitative analysis of the ACL Anthology (around 12,000 papers published between 1985 and 2008). Our approach combines complex systems methods with natural language processing techniques. We reconstruct the socio-semantic landscape of the domain by inferring a co-authorship network and a semantic network from the corpus. First, keywords are extracted using a hybrid approach that combines linguistic patterns with statistical information. Then, the semantic network is built from a co-occurrence analysis of these keywords within the corpus. Combining temporal and network analysis techniques, we examine the main developments in the field and its most active subfields over time. Lastly, we propose a model to explore the mutual influence of the social and semantic networks over time, leading to a socio-semantic co-evolutionary system.
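The co-occurrence step described in the abstract can be illustrated with a minimal sketch. It assumes each paper has already been reduced to a list of keywords and uses raw co-occurrence counts as edge weights. The abstract does not specify the paper's actual keyword extraction or weighting scheme, so the function name `cooccurrence_network` and the toy corpus below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(papers):
    """Count, for each unordered keyword pair, the papers in which both appear.

    Assumption: raw counts stand in for whatever weighting the paper uses.
    """
    edges = Counter()
    for keywords in papers:
        # Sort the de-duplicated keywords so each pair has one canonical order.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy corpus: one keyword list per paper.
papers = [
    ["machine translation", "alignment", "language model"],
    ["language model", "speech recognition"],
    ["machine translation", "language model"],
]

network = cooccurrence_network(papers)
for (a, b), weight in sorted(network.items()):
    print(f"{a} -- {b}: {weight}")
```

Each key of the resulting counter is an edge of the semantic network and each value is its weight. Grouping the papers by publication year before counting would give one such network per period, which is one plausible way to support a temporal analysis.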

Similar resources

The ACL Anthology Searchbench

We describe a novel application for structured search in scientific digital libraries. The ACL Anthology Searchbench is meant to become a publicly available research tool to query the content of the ACL Anthology. The application provides search in both its bibliographic metadata and semantically analyzed full textual content. By combining these two features, very efficient and focused queries ...

The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Antho...

Towards an ACL Anthology Corpus with Logical Document Structure. An Overview of the ACL 2012 Contributed Task

The ACL 2012 Contributed Task is a community effort aiming to provide the full ACL Anthology as a high-quality corpus with rich markup, following the TEI P5 guidelines— a new resource dubbed the ACL Anthology Corpus (AAC). The goal of the task is threefold: (a) to provide a shared resource for experimentation on scientific text; (b) to serve as a basis for advanced search over the ACL Anthology...

He Said, She Said: Gender in the ACL Anthology

Studies of gender balance in academic computer science are typically based on statistics on enrollment and graduation. Going beyond these coarse measures of gender participation, we conduct a fine-grained study of gender in the field of Natural Language Processing. We use topic models (Latent Dirichlet Allocation) to explore the research topics of men and women in the ACL Anthology Network. We ...

The ACL Anthology Network Corpus as a Resource for NLP-based Bibliometrics

The ACL Anthology Network (AAN) is another successful project built on top of the ACL Anthology. It was started in 2007 by our group (CLAIR) (Radev et al., 2009) at the University of Michigan. Table 1 shows some statistics of the current release of AAN. We convert the articles included in the ACL Anthology corpus (excluding book reviews) from PDF to text. This text is then processed to identify...

Publication date: 2014